{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p align=\"center\">\n",
    "    <img src=\"https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true\" width=\"220\" height=\"240\" />\n",
    "\n",
    "</p>\n",
    "\n",
    "## Interactive Central Limit Theorem (CLT) Demonstration\n",
    "\n",
    "### Michael Pyrcz, Professor, University of Texas at Austin \n",
    "\n",
    "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Interactive of the Central Limit Theorem\n",
    "\n",
    "Here is an interactive workflow demonstrating the central limit theorem, whereby our features may tend towards Gaussian or log-normal distributions.\n",
    "\n",
    "I have recorded a walk-through of this interactive dashboard in my [Data Science Interactive Python Demonstrations](https://www.youtube.com/playlist?list=PLG19vXLQHvSDy26fM3hDLg3VCU7U5BGZl) series on my [YouTube](https://www.youtube.com/@GeostatsGuyLectures) channel.\n",
    "\n",
    "* Join me for walk-through of this dashboard [08 Data Science Interactive: Central Limit Theorem](TBD). I'm stoked to guide you and share observations and things to try out!   \n",
    "\n",
    "* I have a lecture on [Parametric Distributions](https://www.youtube.com/watch?v=U7fGsqCLPHU&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=12) as part of my [Data Analytics and Geostatistics](https://www.youtube.com/playlist?list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ) course. Note, for all my recorded lecture the interactive and well-documented workflow demononstrations are available on my GitHub repository [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos).\n",
    "\n",
    "* Also, I have a lecture on [Univariate Statistics](https://www.youtube.com/watch?v=wAcbA2cIqec&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=10) as part of my [Data Analytics and Geostatistics](https://www.youtube.com/playlist?list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ) course.\n",
    "\n",
    "* And, I have a lecture on [Monte Carlo Simulation](https://www.youtube.com/watch?v=Qb8TsSINpnU&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=13) as part of my [Data Analytics and Geostatistics](https://www.youtube.com/playlist?list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ) course.\n",
    "\n",
    "* And, I have a lecture on [Statistical Expectation](https://www.youtube.com/watch?v=QVgMt3cPMmM&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=11) as part of my [Data Analytics and Geostatistics](https://www.youtube.com/playlist?list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ) course.\n",
    "\n",
    "* Finally, I have a lecture on [Distribution Transformations](https://www.youtube.com/watch?v=ZDIpE3OkAIU&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=14&t=3s) as part of feature engineering in my [Machine Learning](https://www.youtube.com/playlist?list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf) course.\n",
    "\n",
    "#### Central Limit Theorem\n",
    "\n",
    "The summation of independent random variables will approach Gaussian distributed as the number of random variables become large:\n",
    "\n",
    "* for any distribution shape for the individual random variables\n",
    "\n",
    "* same for averaging since it is just the application of a scalar to the summation ($\\frac{1}{m}$)\n",
    "\n",
    "#### Now with Mathematical Notation\n",
    "\n",
    "It is convenient to state this with mathematical notation as,\n",
    "\n",
    "\\begin{equation}\n",
    "Y = \\sum_{i=1}^{m \\to \\infty} X_i, \\quad lim_{m \\rightarrow \\infty} Y \\sim N[\\cdot]\n",
    "\\end{equation}\n",
    "\n",
    "\n",
    "where $Y$ is the summation of a large number of independent random variables, $X_i$, where $i = 1,\\ldots,m$.\n",
    "\n",
    "From statistical expectation we can calculate the central tedency as:\n",
    "\n",
    "\\begin{equation}\n",
    "E[Y] = E \\left[\\sum_{i=1}^{m \\to \\infty} X_i \\right] = \\sum_{i=1}^{m \\to \\infty} E \\left[X_i \\right]  \n",
    "\\end{equation}\n",
    "\n",
    "and under the assumption of independence the dispersion as:\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma_Y^2 = \\sum_{i=1}^{m \\to \\infty} \\sigma_X^2  \n",
    "\\end{equation}\n",
    "\n",
    "therefore, we have our distribution as:\n",
    "\n",
    "\\begin{equation}\n",
    "Y \\sim N \\left[ \\sum_{i=1}^{m \\to \\infty} E[X_i], \\sum_{i=1}^{m \\to \\infty} \\sigma_X^2 \\right]\n",
    "\\end{equation}\n",
    "\n",
    "#### Given All $X_i$ Have the Same Mean and Variance\n",
    "\n",
    "We can simplify by assuming the same mean and variance for all random variables, $X_i$, with central tendency and dispersion:\n",
    "\n",
    "\\begin{equation}\n",
    "E[X_i] = \\mu_X, \\ \\ x = 1,\\ldots,m\n",
    "\\end{equation}\n",
    "\n",
    "\\begin{equation}\n",
    "Var[X_i] = \\sigma^2_X, \\ \\ x = 1,\\ldots,m\n",
    "\\end{equation}\n",
    "\n",
    "now we have:\n",
    "\n",
    "\\begin{equation}\n",
    "Y \\sim N \\left[ m \\cdot \\mu_X, m \\cdot \\sigma_X^2 \\right]\n",
    "\\end{equation}\n",
    "\n",
    "#### Given All $X_i$ Have the Same Mean and Variance for Averages\n",
    "\n",
    "for the case of the average instead of the summation of random variables, $X_i$:\n",
    "\n",
    "\\begin{equation}\n",
    "Y = \\frac{1}{m} \\sum_{i=1}^{m \\to \\infty} X_i\n",
    "\\end{equation}\n",
    "\n",
    "and all with the same central tendency and dispersion the distribution of $Y$ is given by:\n",
    "\n",
    "\\begin{equation}\n",
    "Y \\sim N \\left[ \\mu_X, \\frac{\\sigma^2_X}{m} \\right]\n",
    "\\end{equation}\n",
    "\n",
    "\n",
    "\n",
    "**Monte Carlo Simulation** is a method to sample from a single or set of random variables to assess the emergent distribution. \n",
    "\n",
    "* also known as Monte Carlo methods and Monte Carlo experiments\n",
    "\n",
    "* a method in statistics for generating draws from probability distributions empirically\n",
    "\n",
    "* powerful due to it's broad applicability\n",
    "\n",
    "We can apply Monte Carlo methods ot sample from the sum of our random variables.  We proceed as follows:\n",
    "\n",
    "* for all $X_i$ random variables draw a random value (random realization), $x_i$\n",
    "\n",
    "* calculate the average of the random realizations, $y_i = \\frac{1}{n} \\sum_{i=1}^{n} x_i$\n",
    "\n",
    "* repeat over a large number of samples, $n$, and observe the resulting distribution\n",
    "\n",
    "* compare the experimental CDF to the Gaussian distribution predicted from the Central Limit Theorem\n",
    "\n",
    "assess the uncertainty in a sample statistic by repeated random sampling with replacement.\n",
    "\n",
    "#### Getting Started\n",
    "\n",
    "Here's the steps to get setup in Python with the GeostatsPy package:\n",
    "\n",
    "1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n",
    "\n",
    "That's all!\n",
    "\n",
    "#### Load the Required Libraries\n",
    "\n",
    "We will also need some standard Python packages. These should have been installed with Anaconda 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "from ipywidgets import interactive                        # widgets and interactivity\n",
    "from ipywidgets import widgets                            \n",
    "from ipywidgets import Layout\n",
    "from ipywidgets import Label\n",
    "from ipywidgets import VBox, HBox\n",
    "import matplotlib.pyplot as plt                           # plotting\n",
    "from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks\n",
    "import seaborn as sns                                     # ddf plotting\n",
    "import numpy as np                                        # working with arrays\n",
    "import pandas as pd                                       # working with DataFrames\n",
    "from scipy.stats import triang                            # parametric distributions\n",
    "from scipy.stats import binom\n",
    "from scipy.stats import norm\n",
    "from scipy.stats import uniform\n",
    "from scipy.stats import triang\n",
    "from scipy.stats import probplot                        # visually check univariate Gaussianity\n",
    "from scipy import stats                                   # statistical calculations\n",
    "import random                                             # random drawing / bootstrap realizations of the data\n",
    "import math                                               # square root operator\n",
    "plt.rc('axes', axisbelow=True)                            # set axes and grids in the background for all plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs.  \n",
    "\n",
    "#### Declare Functions\n",
    "\n",
    "I just added a convenience function for adding major and minor gridlines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_grid():\n",
    "    plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids\n",
    "    plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n",
    "    plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Interactive Central Limit Theorem Demonstration\n",
    "\n",
    "Now let's set up our dashboard."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# interactive calculation of the sample set (control of source parametric distribution and number of samples)\n",
    "l = widgets.Text(value='                                                Central Limit Theorem Demonstration, Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n",
    "dist = widgets.Dropdown(\n",
    "    options=['Uniform','Triangular','Gaussian'],\n",
    "    value='Uniform',\n",
    "    description='$F_X(x)$',\n",
    "    disabled=False,\n",
    "    layout=Layout(width='200px', height='30px')\n",
    ")\n",
    "mean = widgets.FloatSlider(min=0.0, max = 1.0, value = 0.50, description = 'Mean',orientation='horizontal',layout=Layout(width='400px', height='30px'),continuous_update=False)\n",
    "mean.style.handle_color = 'darkorange'\n",
    "std = widgets.FloatSlider(min=0.01, max = 1.0, value = 0.10, step = 0.05, description = 'St.Dev.',orientation='horizontal',layout=Layout(width='400px', height='30px'),continuous_update=False)\n",
    "std.style.handle_color = 'darkorange'\n",
    "mini = widgets.FloatSlider(min = 0, max = 1.0, value = 0.25, description = 'Min.',orientation='horizontal',layout=Layout(width='400px', height='30px'),continuous_update=False)\n",
    "mini.style.handle_color = 'darkorange'\n",
    "maxi = widgets.FloatSlider(min = 0, max = 1.0, value = 0.75, description = 'Max.',orientation='horizontal',layout=Layout(width='400px', height='30px'),continuous_update=False)\n",
    "maxi.style.handle_color = 'darkorange'\n",
    "m = widgets.IntSlider(min = 1, max = 20, value = 1, description = '$m$ ',orientation='horizontal',layout=Layout(width='400px', height='30px'),continuous_update=False)\n",
    "m.style.handle_color = 'darkorange'\n",
    "L = widgets.IntSlider(min = 1, max = 1000, value = 100, description = '$L$ ',orientation='horizontal',layout=Layout(width='400px', height='30px'),continuous_update=False)\n",
    "L.style.handle_color = 'darkorange'\n",
    "\n",
    "def update_dashboard(*args):\n",
    "    if dist.value == 'Gaussian':\n",
    "        mini.layout.display = \"none\"\n",
    "        maxi.layout.display = \"none\"\n",
    "        mean.layout.display = \"flex\"\n",
    "        mean.description = 'Mean'\n",
    "        std.layout.display = \"flex\"\n",
    "    elif dist.value == \"Uniform\":\n",
    "        mean.layout.display = \"none\"\n",
    "        std.layout.display = \"none\"\n",
    "        mini.layout.display = \"flex\"\n",
    "        maxi.layout.display = \"flex\"\n",
    "    elif dist.value == \"Triangular\":\n",
    "        mean.layout.display = \"flex\"\n",
    "        mean.description = 'Mode'\n",
    "        std.layout.display = \"none\"\n",
    "        mini.layout.display = \"flex\"\n",
    "        maxi.layout.display = \"flex\"\n",
    "        \n",
    "dist.observe(update_dashboard,'value'); update_dashboard()\n",
    "\n",
    "uia = widgets.HBox([dist,m,L],)                   # basic widget formatting \n",
    "uib = widgets.HBox([mini,mean,std,maxi],)                   # basic widget formatting \n",
    "ui2 = widgets.VBox([l,uia,uib],)\n",
    "\n",
    "def f_make(dist,mean,std,mini,maxi, m, L):                       # function to take parameters, make sample and plot\n",
    "    \n",
    "    dataset = make_average_data(dist,mean,std,mini,maxi,m,L)\n",
    "\n",
    "    plt.subplot(221)\n",
    "    #sample = make_average_data(dist,mean,std,mini,maxi, m=1, L=10000)\n",
    "    pdf = np.zeros(1000); pdf_avg_norm = np.zeros(1000)\n",
    "    if dist == 'Uniform':\n",
    "        minv = mini - (maxi-mini); maxv = maxi + (maxi-mini)\n",
    "        pdf = stats.uniform.pdf(np.linspace(minv,maxv,1000),loc = mini,scale = maxi - mini)* 2 * L \n",
    "    elif dist == 'Triangular':\n",
    "        minv = mini; maxv = maxi\n",
    "        pdf = stats.triang.pdf(np.linspace(minv,maxv,1000),loc = mini,c = (mean-mini)/(maxi-mini), scale = maxi-mini)* 2 * L\n",
    "    elif dist == 'Gaussian':    \n",
    "        minv = mean - 4*std; maxv = mean + 4*std \n",
    "        pdf = stats.norm.pdf(np.linspace(minv,maxv,1000),loc = mean, scale = std)* 2 * L\n",
    "     \n",
    "    #bins = np.linspace(minv,maxv,50)\n",
    "    bins = np.linspace(0,1,50)\n",
    "    #plt.hist(sample,alpha=0.6,color=\"darkorange\",edgecolor=\"black\",bins=bins)\n",
    "    plt.plot(np.linspace(minv,maxv,1000),pdf,'--',color='black',linewidth = 3,zorder=10)\n",
    "    plt.fill_between(np.linspace(minv,maxv,1000),pdf,np.full(1000,0),color='darkorange',alpha=0.6,zorder=1)\n",
    "    #plt.xlim(minv,maxv)\n",
    "    plt.xlim([0,1.0])\n",
    "    plt.ylim(bottom=0); plt.title(r'$f_{X_i}(x)$'); plt.ylabel('Density'); plt.xlabel('$f_{X_i}(x)$'); add_grid()\n",
    "    \n",
    "    plt.subplot(222) \n",
    "    plt.hist(dataset,alpha=0.6,color=\"darkorange\",edgecolor=\"black\",bins=bins,density = True,label='Monte Carlo') \n",
    "    pdf_avg_norm = stats.norm.pdf(np.linspace(minv,maxv,1000),loc = np.average(dataset), scale = np.std(dataset))* 2 * L\n",
    "    plt.plot(np.linspace(minv,maxv,1000),pdf_avg_norm/(2*L),color='black',linewidth = 2,label='Gaussian')\n",
    "    sns.kdeplot(x=dataset,color = 'black',alpha = 0.8,levels = 1,bw_method=0.1,linewidth = 2,dashes = (5,2,1,2),label='Monte Carlo')\n",
    "    ymin,ymax = plt.gca().get_ylim()\n",
    "    plt.xlim(minv,maxv); plt.title(r'$f_Y$ vs. Gaussian PDF'); plt.ylabel('Frequency'); plt.xlabel(r'$Y$')\n",
    "    plt.legend(loc='upper left')\n",
    "    plt.ylim([0,ymax]); plt.xlim([0,1.0]); add_grid()\n",
    "    plt.annotate(r'$Y= \\frac{1}{m} \\sum_{i=1}^m X_i$',xy=[0.05,ymax*0.75],size=14)\n",
    "    \n",
    "    plt.subplot(223)    \n",
    "    plt.hist(dataset,cumulative = True, density = True, alpha=0.6,color=\"darkorange\",edgecolor=\"black\", bins = bins, label = 'Monte Carlo')\n",
    "    #plt.xlim(minv,maxv); \n",
    "    plt.xlim([0,1.0])\n",
    "    plt.ylim(0,1); plt.title(r'$F_Y$ vs. Gaussian CDF'); plt.xlabel(r'$Y$'); plt.ylabel('Cumulative Probability') \n",
    "    cumul_prob = np.linspace(0.0,1.0,100)\n",
    "    prop_values = norm.ppf(cumul_prob) \n",
    "    prop_values = prop_values * np.std(dataset) + np.average(dataset) \n",
    "    plt.plot(prop_values, cumul_prob, color = 'black',linewidth = 2,label = 'Gaussian')\n",
    "    plt.plot(np.percentile(dataset,np.linspace(0.0,100.0,100)),np.linspace(0.0,100.0,100)/100,color='black',linewidth = 2, dashes = (5,2,1,2),label='Monte Carlo')\n",
    "    plt.legend(loc='upper left'); add_grid()\n",
    "    plt.annotate(r'$Y= \\frac{1}{m} \\sum_{i=1}^m X_i$',[0.05,0.75],size=14)\n",
    "    \n",
    "    plot4 = plt.subplot(224) \n",
    "    probplot(dataset,plot=plot4)\n",
    "    plot4.get_lines()[0].set_marker('p'); plot4.get_lines()[0].set_markerfacecolor('darkorange')\n",
    "    plot4.get_lines()[0].set_markeredgecolor('black'); plot4.get_lines()[1].set_color('black')\n",
    "    plt.title('Normal Probability Plot $Y$')\n",
    "    plt.xlim([-3,3]); plt.ylim(0,1); add_grid(); plt.ylabel(r'Ordered $Y$')\n",
    "    #plt.ylim([minv,maxv])\n",
    "        \n",
    "    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=2.2, wspace=0.3, hspace=0.2)\n",
    "    plt.show()\n",
    "\n",
    "def make_average_data(dist,mean,std,mini,maxi, m, L):            # function to check parameters and make samples \n",
    "    average = 0.0; stdev = 0.0\n",
    "    null_data = np.zeros(L)\n",
    "    if dist == 'Uniform':\n",
    "        if mini >= maxi:\n",
    "            print('Invalid uniform distribution parameters')\n",
    "            return null_data\n",
    "        dataset = np.mean(uniform.rvs(size=[m,L], loc = mini, scale = maxi-mini, random_state = 73073),axis = 0)\n",
    "        return dataset\n",
    "    elif dist == 'Triangular':\n",
    "        if mean <= mini or mean >= maxi or (maxi - mini) <= 0:\n",
    "            print('Invalid triangular distribution parameters')\n",
    "            return null_data       \n",
    "        dataset = np.mean(triang.rvs(size=[m,L], loc = mini, c = (mean-mini)/(maxi-mini), scale = (maxi-mini), random_state = 73073), axis = 0)\n",
    "        return dataset\n",
    "    elif dist == 'Gaussian':\n",
    "        dataset = np.mean(norm.rvs(size=[m,L], loc = mean, scale = std, random_state = 73073), axis = 0)\n",
    "        return dataset\n",
    "\n",
    "# connect the function to make the samples and plot to the widgets    \n",
    "interactive_plot = widgets.interactive_output(f_make, {'dist': dist,'mean':mean,'std':std,'mini':mini,'maxi':maxi,'m': m,'L':L})\n",
    "interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Central Limit Theorem Demonstration\n",
    "\n",
    "#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
    "\n",
    "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n",
    "\n",
    "### The Problem\n",
    "\n",
    "The summation of $m$ random variables (of any distribution type), $Y = \\frac{1}{m}\\sum_{i=1}^m X_i$, will tend to Gaussian, $Y \\sim \\mathcal{N}(\\mu,\\,\\frac{\\sigma^{2}}{m})$, as $m \\rightarrow \\infty$. The dashboard inputs:\n",
    "\n",
    "* $F_{X_i}(x)$: parametric distribution and associated parameters, Minimum, Maximum, Mean, Mode, or Standard Deviation.\n",
    "\n",
    "* $m$: number of random variables, $L$: number of realizations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fddda959efd64512b0a6517f07d4ebf2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Text(value='                                                Central Limit Theorem Demonstration…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e2001d5abe6b4aaebd27b020cfe34114",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '<Figure size 432x288 with 4 Axes>', 'i…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(ui2, interactive_plot)                            # display the interactive plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Observations\n",
    "\n",
    "Some observations:\n",
    "\n",
    "* if $X_i$ are Gaussian, $Y$ is Gaussian distributed for any $m$\n",
    "\n",
    "* if $X_i$ are uniform then convergence occurs over 5 or more RVs averaged, it happens quickly for small n's.\n",
    "\n",
    "* if $X_i$ are triangular distribution, $Y$ converges to Gaussian faster than if $X_i$ are uniform\n",
    "\n",
    "#### Comments\n",
    "\n",
    "This was an interactive demonstration of the the central limit theorem (CLT). I'm providing students an opportunity to play with data analytics, geostatistics and machine learning for experiential learning.\n",
    "  \n",
    "I hope this was helpful,\n",
    "\n",
    "*Michael*\n",
    "\n",
    "#### The Author:\n",
    "\n",
    "### Michael Pyrcz, Professor, University of Texas at Austin \n",
    "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
    "\n",
    "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
    "\n",
    "For more about Michael check out these links:\n",
    "\n",
    "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
    "\n",
    "#### Want to Work Together?\n",
    "\n",
    "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
    "\n",
    "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
    "\n",
    "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
    "\n",
    "* I can be reached at mpyrcz@austin.utexas.edu.\n",
    "\n",
    "I'm always happy to discuss,\n",
    "\n",
    "*Michael*\n",
    "\n",
    "Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin\n",
    "\n",
    "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
